FastBLAST: Homology Relationships for Millions of Proteins

Authors

  • Morgan N. Price
  • Paramvir S. Dehal
  • Adam P. Arkin
Abstract

BACKGROUND: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding.

METHODOLOGY/PRINCIPAL FINDINGS: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query.

CONCLUSIONS/SIGNIFICANCE: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.
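The two-stage idea in the abstract can be illustrated with a toy sketch. This is not FastBLAST's code or its actual algorithm: the function names, the sample sequences, the family labels, and the `toy_score` function (a simple count of identical positions, not a BLAST bit score) are all assumptions made for illustration. The sketch only shows why a family index reduces work: a query is compared pairwise only against proteins that share a family with it, rather than against the whole database.

```python
from collections import defaultdict


def build_family_index(memberships):
    """Stage 1 (toy): map each family to the set of proteins assigned to it."""
    index = defaultdict(set)
    for protein, families in memberships.items():
        for family in families:
            index[family].add(protein)
    return index


def candidate_homologs(query, memberships, index):
    """Stage 2a (toy): proteins sharing at least one family with the query."""
    candidates = set()
    for family in memberships.get(query, ()):
        candidates |= index[family]
    candidates.discard(query)
    return candidates


def toy_score(a, b):
    """Hypothetical similarity: number of identical aligned positions."""
    return sum(x == y for x, y in zip(a, b))


def find_homologs(query, sequences, memberships, index, min_score):
    """Stage 2b (toy): score only the candidates, keep those above a cutoff."""
    hits = []
    for other in candidate_homologs(query, memberships, index):
        score = toy_score(sequences[query], sequences[other])
        if score >= min_score:
            hits.append((other, score))
    # Best score first; ties broken by name so the order is deterministic.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Made-up example data (not from the paper).
sequences = {
    "p1": "MKTAYIAK",
    "p2": "MKTAYLAK",
    "p3": "GGGGGGGG",
    "p4": "MKTGYIAK",
}
memberships = {
    "p1": {"famA"},
    "p2": {"famA"},
    "p3": {"famB"},
    "p4": {"famA", "famB"},
}
index = build_family_index(memberships)
hits_p1 = find_homologs("p1", sequences, memberships, index, min_score=5)
```

In this toy data set, `p3` is never compared with `p1` because they share no family, which is the source of the savings relative to comparing every pair.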


Similar resources

In Silico and in Vitro Investigations on cry4a and cry11a Toxins of Bacillus thuringiensis var Israelensis

In the present study we attempted to correlate the structure and function of the cry11a (72 kDa) and cry4a (135 kDa) proteins of Bacillus thuringiensis var israelensis. Homology modeling and secondary structure predictions were done to locate most probable regions for finding helices or strands in these proteins. The JPRED (JPRED consensus secondary structure prediction server) secondary struct...


Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...


Prediction of the P. falciparum Target Space Relevant to Malaria Drug Discovery

Malaria is still one of the most devastating infectious diseases, affecting hundreds of millions of patients worldwide. Even though there are several established drugs in clinical use for malaria treatment, there is an urgent need for new drugs acting through novel mechanisms of action due to the rapid development of resistance. Resistance emerges when the parasite manages to mutate the sequenc...


Molecular Analysis of A2-genes Encoding Stage-specific S Antigen-like Proteins among Isolates from Iranian Cutaneous and Visceral Leishmaniasis

Objective(s): Leishmania can lead to a broad spectrum of diseases, collectively known as leishmaniasis. The A2 gene/protein family could be one of the most eligible candidate factors of virulence in visceral leishmaniasis (VL). Previous results confirmed that in Leishmania infantum, several A2 proteins are abundantly expressed by the amastigote, but not the promastigote stage. As there are...


The relationships among acute phase response proteins, cytokines, and enzymes during ovine experimental endotoxemia

BACKGROUND: The acute phase response is beneficial to the animal in restoring homeostasis, and measuring the circulating acute phase proteins, cytokines, and enzymes can be used to evaluate the innate immune system's responses to invader agents such as bacterial lipopolysaccharide. Measurement of these parameters has shown to be useful as diagnostic and prognostic markers in animal endotoxemia. OBJEC...



Journal title:
  • PLoS ONE

Volume 3  Issue

Pages  -

Publication date 2008